AN ANALYSIS OF THE NAVAL PERSONNEL 
PAY PREDICTOR (ENLISTED MODEL) 



Allan Ray Walker 









Monterey, California 





AN ANALYSIS OF THE 
NAVAL PERSONNEL PAY PREDICTOR 
(ENLISTED MODEL) 

by 

Allan Ray Walker 
September 1975 

Thesis Advisor: R.W. Butterworth 



Approved for public release; distribution unlimited. 






UNCLASSIFIED
SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)

REPORT DOCUMENTATION PAGE
READ INSTRUCTIONS BEFORE COMPLETING FORM

1. REPORT NUMBER

2. GOVT ACCESSION NO.

3. RECIPIENT'S CATALOG NUMBER

4. TITLE (and Subtitle)
An Analysis of the Naval Personnel Pay Predictor (Enlisted Model)

5. TYPE OF REPORT & PERIOD COVERED
Master's Thesis; September 1975

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)
Allan Ray Walker

8. CONTRACT OR GRANT NUMBER(s)

9. PERFORMING ORGANIZATION NAME AND ADDRESS
Naval Postgraduate School
Monterey, California 93940

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS

11. CONTROLLING OFFICE NAME AND ADDRESS
Naval Postgraduate School
Monterey, California 93940

12. REPORT DATE
September 1975

13. NUMBER OF PAGES
52

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report)
Unclassified

15a. DECLASSIFICATION/DOWNGRADING SCHEDULE

16. DISTRIBUTION STATEMENT (of this Report)
Approved for public release; distribution unlimited.

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
Enlisted Pay Predictor

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

The Naval Personnel Pay Predictor (Enlisted Model) is
used by the Bureau of Naval Personnel as a tool for predicting
the total annual basic pay for the enlisted force as an input
to the budget process. A major source of error in the model
was found to be the prediction of the length of service (LOS)
vector, and an attempt to improve this prediction was made.
The extreme complexity of the model was found to be unnecessary,




and a simple exponential smoothing subroutine for LOS 
prediction did as well or better than the original model. It 
was also found that a double exponential smoothing subroutine, 
taking into account the trends in the force structure, would 
almost uniformly improve the one year prediction from the model. 






An Analysis of the
Naval Personnel Pay Predictor
(Enlisted Model)

by

Allan Ray Walker
Lieutenant, United States Navy
B.S., University of Louisville, 1968
M.S., University of West Florida, 1970 



Submitted in partial fulfillment of the 
requirements for the degree of 



MASTER OF SCIENCE IN OPERATIONS RESEARCH 



from the 



NAVAL POSTGRADUATE SCHOOL 
September 1975 






ABSTRACT 



The Naval Personnel Pay Predictor (Enlisted Model) is
used by the Bureau of Naval Personnel as a tool for
predicting the total annual basic pay for the enlisted force
as an input to the budget process. A major source of error
in the model was found to be the prediction of the length
of service (LOS) vector, and an attempt to improve this
prediction was made. The extreme complexity of the model
was found to be unnecessary, and a simple exponential
smoothing subroutine for LOS prediction did as well or
better than the original model. It was also found that a
double exponential smoothing subroutine, taking into account
the trends in the force structure, would almost uniformly
improve the one year prediction from the model.







TABLE OF CONTENTS

I. NATURE OF THE PROBLEM

II. THE NAPPE MODEL

    A. SMOOTHING THE LOS VECTORS OF THE INVENTORIES

    B. GENERATING THE PAY GRADE BY LOS MATRIX

III. EXPERIMENTS WITH AND CHANGES TO THE NAPPE MODEL

    A. MAJOR SOURCE OF ERROR

    B. SIMPLE EXPONENTIAL SMOOTHING

        1. The SMOTHY Subroutine

        2. Results of SMOTHY

    C. DOUBLE EXPONENTIAL SMOOTHING

        1. The SMOTH2 Subroutine

        2. Results of SMOTH2

    D. THE MATRIX GENERATION PROCEDURE

IV. CONCLUSIONS

APPENDIX A

APPENDIX B

COMPUTER PROGRAM

LIST OF REFERENCES

INITIAL DISTRIBUTION LIST



LIST OF TABLES

I. Comparison of NAPPE with Pure Mosteller

II. Comparison of NAPPE and SMOTHY Predictions

III. Comparison of Absolute Percentage Errors



I. NATURE OF THE PROBLEM



The Bureau of Naval Personnel requires many mathematical 
models for accurately predicting the structure of the future 
force. These models are used as tools to aid in planning 
decisions. Of special interest is the problem of costing 
the future force as a part of the budget submission 
procedure. 

The Bureau of Naval Operations determines the personnel
requirements for the future force and passes these to the
Bureau of Personnel for implementation. These requirements
are presented in the form of quarterly pay grade vectors,
that is, the number of people required in pay grades E-1,
E-2, E-3, ..., E-9. Since the amount of pay received depends
on the member's length of service as well as pay grade,
predicting the total cost of the force becomes complex.

The specific problem considered was: given the future
size of the force by pay grade and the past and present
inventories, predict the total annual base pay of the force
for future years.



II. THE NAPPE MODEL



One model currently used for this purpose is the Naval
Personnel Pay Predictor (Enlisted Model), referred to as
NAPPE. The model makes use of a data base consisting of
three sets of quarterly inventories (pay grade by LOS force
matrices) for all years since 1957. The inventories are for
United States Navy (USN), United States Naval Reserve (USNR),
and Total Navy (TOTALNAV), the sum of the two.

The procedure is, first, to predict the future quarterly
LOS vectors for the desired number of years into the future
(up to ten). An LOS vector gives the total number of people
with length of service 1, 2, ..., 31 years. The methodology
used for this prediction is discussed later in this section
and in Appendices A and B. The LOS vector is then combined
with the pay grade requirements vector to get the predicted
force matrix; this procedure is also discussed in this
section. The cost of the force is then simply the sum, over
all cells of the matrix, of the straight-line average (between
successive quarters) of the number of people in the cell
multiplied by the pay scale for that cell, which is an input
to the model.
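The costing step can be illustrated with a short Python sketch. It is
only an illustration of the averaging described above, not the NAPPE
code: the function name force_cost, the list-of-lists layout, and the
assumption that the pay scale is expressed per person per quarter are
all hypothetical.

```python
def force_cost(inventories, pay_scale):
    """Cost of the force over the quarters spanned by `inventories`.

    inventories: pay-grade-by-LOS matrices in time order, one per
                 quarter boundary (five matrices span four quarters).
    pay_scale:   matrix of the same shape; pay per person per quarter
                 for each cell (an assumed unit, for illustration only).
    """
    total = 0.0
    for start, end in zip(inventories, inventories[1:]):
        for grade, pay_row in enumerate(pay_scale):
            for los, pay in enumerate(pay_row):
                # Straight-line average of the cell between two quarters.
                average_count = 0.5 * (start[grade][los] + end[grade][los])
                total += average_count * pay
    return total


# Toy example: one pay grade, two LOS cells, one quarter.
quarter_start = [[10, 20]]
quarter_end = [[20, 40]]
pay = [[100.0, 200.0]]
print(force_cost([quarter_start, quarter_end], pay))  # 7500.0
```

In the toy example the averaged cell counts are 15 and 30, so the cost
is 15 x 100 + 30 x 200.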

A. SMOOTHING THE LOS VECTORS OF THE INVENTORIES 

The first step in the NAPPE model's prediction of the LOS
vector is accomplished by a subroutine referred to as SMOOTH
(refer to Appendix A for the mathematical model). Throughout
the discussion of SMOOTH it should be remembered that all
calculations applied to previous years' data are made
independently for the population (the total number of people) in
each element of the LOS vector (henceforth referred to as an
LOS cell) and for the transition rates from one cell to the
next, and are computed for all three data bases. The transition
rate is simply the proportion of the population in cell i of
year j which moves to cell i+1 in year j+1. The methodology is
basic single exponential smoothing as discussed by Brown [1]
and others.

The following procedure is done independently for each
LOS cell. For each year of historical data, a prediction is
made based on exponentially smoothing the data up to that
year using values of the smoothing constant (henceforth
referred to as alpha) of .05, .10, ..., .95. For each year, the
predicted value is then compared with the actual value to
determine which value of alpha would have given the best
prediction. This results in the selection of an alpha for each
LOS cell for each year of data. Consult Appendix A for the exact
procedure and the forms of the resulting error that are stored
and used by the model. "Best" predictions and resulting
errors are made for all years of historical data, finally
resulting in a decision for the "best" alpha for predicting
the future.
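The alpha search can be sketched in Python as follows. This is a
simplified illustration, not the SMOOTH subroutine itself: it scores
each alpha by the absolute error of a single prediction, whereas SMOOTH
keeps several error measures (Appendix A). The function names and the
choice of the first observation as the starting value are assumptions.

```python
def smooth(series, alpha, start):
    """Single exponential smoothing of `series`, beginning from `start`."""
    level = start
    for x in series:
        level = alpha * x + (1.0 - alpha) * level
    return level


def best_alpha(history, actual):
    """Alpha in .05, .10, ..., .95 whose smoothed value is closest to `actual`."""
    alphas = [round(0.05 * n, 2) for n in range(1, 20)]
    start = history[0]  # assumed starting value
    return min(alphas, key=lambda a: abs(smooth(history[1:], a, start) - actual))


# A cell whose rate jumped in the latest observation and stayed there:
# the largest alpha tracks the jump best.
print(best_alpha([0.10, 0.10, 0.20], actual=0.20))  # 0.95
```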

The output of SMOOTH consists of four sets of LOS vectors.
For each year of historical data, there is a prediction based
on transition rates and a prediction based on the previous
years' cell populations; one pair is based on the TOTALNAV
data and the other pair on the sum of the predictions based on
USN and USNR data. Note that, due to the difference in the
structure of the USN and USNR, the sum of these predictions
may differ from the prediction based on their sum.

The final prediction is made by a subroutine referred
to as ADJSMO (refer to Appendix B for the model). ADJSMO
considers five "methods" for prediction. These include the
four outputs from SMOOTH plus a weighted average of these
predictions. This weighted average is formed by multiplying
a weight (BWT) times the average of the two transition rate
based predictions plus the complementary weight (1.0 - BWT)
times the average of the two population based predictions.
This calculation is made for values of BWT of .45, .50, ...,
.95. For each year of data a "best" method (of the five),
in the least square error sense, and a "best" weight, if a
weighted average method was chosen, are selected for predicting
that year. The absolute sum of the errors of the "best"
prediction is also calculated for use in adjusting the final
prediction. This adjustment is necessary because no transition
rate is available to predict LOS cell 1.
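The weighted-average method can be sketched as below. The function
names and the single-cell example are hypothetical; ADJSMO works on
whole LOS vectors and also compares the result with the four unblended
predictions, which this sketch omits.

```python
def blended(p1, p2, p3, p4, bwt):
    """BWT times the mean of the two transition rate based predictions
    plus (1.0 - BWT) times the mean of the two population based ones."""
    return bwt * (p1 + p2) / 2.0 + (1.0 - bwt) * (p3 + p4) / 2.0


def best_weight(p1, p2, p3, p4, actual):
    """BWT in .45, .50, ..., .95 with the smallest squared error."""
    weights = [round(0.45 + 0.05 * n, 2) for n in range(11)]
    return min(weights, key=lambda w: (blended(p1, p2, p3, p4, w) - actual) ** 2)


# Transition rate methods say 100, population methods say 200,
# and the actual value lies halfway between them.
print(best_weight(100.0, 100.0, 200.0, 200.0, actual=150.0))  # 0.5
```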

Having selected a "best" method and a "best" weighting
factor based on the last year of historical data, the model
predicts the first future year values for LOS cells 2-31.
At this point the model calculates the average (over all
years of data) proportion of the total population that was
in LOS cell 1. This proportion is then applied to the total
force required for the quarter under consideration and
compared with the number which would be in cell 1 given the
predicted values for cells 2-31 and the required total.
Half of the difference between these two values is then
allocated among cells 2-31 according to the total absolute
error discussed above. The prediction for cell 1 is then the
difference between the required total size of the force and
the predictions made for cells 2-31.

B. GENERATING THE PAY GRADE BY LOS MATRIX 

The pay grade by LOS matrix is calculated using a method
for renormalizing contingency tables, as described by Mosteller
[Ref. 3]. This method is an iterative procedure which takes
the desired marginal totals of a matrix and a given, or base,
matrix of the desired form and constructs a matrix as similar
as possible to the base matrix, having the marginal totals
that were desired.

Since this method was used throughout the research, a
brief discussion of the procedure follows:

Let $A_{i,j}$ be the elements of the base matrix, $i = 1, 2, \ldots, 9$, $j = 1, 2, \ldots, 31$

Let $R_j$ be the desired row totals

Let $C_i$ be the desired column totals

(1) $R'_j = \sum_{i=1}^{9} A_{i,j}$  for all $j$

$D_j = R_j / R'_j$  for all $j$

$A'_{i,j} = D_j A_{i,j}$  for all $i,j$

$C'_i = \sum_{j=1}^{31} A'_{i,j}$  for all $i$

$D'_i = C_i / C'_i$  for all $i$

$A''_{i,j} = D'_i A'_{i,j}$  for all $i,j$

$A_{i,j} = A''_{i,j}$

Return to step (1)



The procedure is continued until the row and column totals 
converge to the desired totals. 
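The renormalization loop, often called iterative proportional fitting,
can be sketched in Python. This is an illustrative version, not the
PNGPNG subroutine: the function name, tolerance, and iteration cap are
assumptions, and the two sets of desired totals must add up to the same
grand total for the loop to converge.

```python
def renormalize(base, row_totals, col_totals, tol=1e-9, max_iter=1000):
    """Scale `base` until its margins match the desired totals.

    base:       matrix (list of lists) of non-negative values
    row_totals: desired sum of each inner list
    col_totals: desired sum of each column
    """
    a = [row[:] for row in base]
    for _ in range(max_iter):
        # Scale each row to its desired total.
        for i, target in enumerate(row_totals):
            current = sum(a[i])
            if current > 0:
                a[i] = [x * target / current for x in a[i]]
        # Scale each column to its desired total.
        for j, target in enumerate(col_totals):
            current = sum(row[j] for row in a)
            if current > 0:
                for row in a:
                    row[j] *= target / current
        # Columns now match; stop once the rows match as well.
        if all(abs(sum(a[i]) - t) <= tol for i, t in enumerate(row_totals)):
            break
    return a


fitted = renormalize([[1.0, 1.0], [1.0, 1.0]], [3.0, 7.0], [4.0, 6.0])
print(fitted)  # approximately [[1.2, 1.8], [2.8, 4.2]]
```

Which margin is scaled first is arbitrary in this sketch; the procedure
above scales by the $R_j$ totals first and the $C_i$ totals second.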

In the NAPPE model, the marginals are the given pay
grade vector and the predicted LOS vector. The base matrix
is calculated as the simple average of the last twelve
quarterly historic inventories.



III. EXPERIMENTS WITH AND CHANGES TO THE NAPPE MODEL



Since the object of the research was to improve the
predictive accuracy of the NAPPE model, the sources of error
had to be determined. It appeared that there were two
independent sources of error: the prediction of the LOS vector
and the instability of the Mosteller procedure for completing
the matrix. Both of these possible problem areas were studied.
This section discusses the first area and the changes
which were made to the model to improve its predictive quality.
This is followed by a discussion of a study to discover the
factors which influence the Mosteller procedure.

A. MAJOR SOURCE OF ERROR

The removal of either of the above-mentioned sources
of error should improve the predictions of the model. An ad
hoc test of this hypothesis was accomplished by using the
Mosteller subroutine (called PNGPNG) with the correct LOS
vector and comparing the results of the model with known
values, for years with historical data available.

The NAPPE model has a validation feature which facilitates
this and other kinds of comparisons. As an input to the
model, the last date of historical data to be used is given.
The model then only looks at data up to that date and predicts
as if that were today's date. Also included in the NAPPE
package (which consists of several minor models besides
NAPPE itself) is a model called NAPVAL. This model compares
the NAPPE output with the actual inventories. These
comparisons are discussed throughout this paper. Specifically,
any number called "actual" will mean an output from NAPVAL.
Also, throughout the paper, the measure of effectiveness for
comparison will be the total annual cost of the force, which
is an output of both models.

In order to accomplish the above-mentioned objective,
the SMOOTH and ADJSMO subroutines were removed from the model.
In their place, the actual LOS vector was read from the
inventories. The following table compares the prediction based
on this procedure with the prediction of NAPPE. The elements
of the table are the actual cost (NAPVAL), the NAPPE prediction
(with the model untouched), and the prediction using only the
Mosteller procedure (NAPPE with SMOOTH and ADJSMO removed),
labeled "Using Actual LOS".



TABLE I

COMPARISON OF NAPPE WITH PURE MOSTELLER
(costs are in millions of dollars)

        Actual    NAPPE         %        Using         %
Year    Cost      Prediction    Error    Actual LOS    Error

1968    1,832     1,839         .363     1,832         .014
1969    2,009     2,016         .372     2,010         .061
1970    2,280     2,286         .262     2,282         .081
1971    2,264     2,258         .290     2,266         .092



These results indicated that a major source of error in the
model, and hence a potential for improvement, results from
the prediction of the LOS vector, as was expected. If one
looks at any feature of the enlisted force (such as size or
distribution), one finds that it is not stationary in time,
even considering statistical fluctuations. There are obvious
trends. During war years, the force becomes larger and, on
the average, younger, while during peacetime the force
becomes smaller and older. Since single exponential smoothing
does not allow for trends, it could not be expected to handle
the problem being considered.

However, before attacking this problem, there were other
questions to be answered. After documenting the model, two
other questions came to mind. Is the sheer complexity of the
model worth the computer requirements? (The following section
indicates not.) Is the use of the entire data base justified?
Intuitively, the answer to the second question was no.
The size and structure of the force in the late 1950's are
not indicative of the force in 1975. There are continual
policy changes which affect enlistment, promotion, and
retention.

B. SIMPLE EXPONENTIAL SMOOTHING 

In order to answer these questions, the first change in
the model was made. The SMOOTH and ADJSMO subroutines were
removed and a subroutine, SMOTHY, replaced them. This
subroutine used simple exponential smoothing of the transition
rates and only the TOTALNAV data base.

1. The SMOTHY Subroutine

This subroutine results in an extensive simplification
of the NAPPE model, as it uses only four years of
historical data and a single alpha value of 0.4. This alpha
value may seem very large, but it was desired to make the
prediction extremely dependent on the most recent data, which
are the most significant. The actual subroutine is included
at the end of the paper, but the simple mathematical model
follows:

Let $A_{i,j,k}$ be the number of people in LOS cell $k$ in quarter $j$ of year $i$

For the four years of historical data calculate the
loss rate for each quarter and each LOS independently

$TR_{i,j,k} = \dfrac{A_{i,j,k} - A_{i+1,j,k+1}}{A_{i,j,k}}$

The following series of data were smoothed for prediction

$\ldots, TR_{i,1,k}, TR_{i,2,k}, \ldots, TR_{i,4,k}, TR_{i+1,1,k}, \ldots$

For each LOS calculate the annual loss rate for prediction
($P_{i,k}$) using single exponential smoothing. The procedure
is to iterate through four quarters of data, which results
in a single value for each year. The superscript $(j)$,
$j = 1, 2, 3, 4$, is used to indicate the intermediate steps of
the procedure but need not be carried once the annual
predicted loss rate has been calculated.

$P^{(1)}_{i,k} = \alpha\, TR_{i,1,k} + (1.0 - \alpha)\, P_{i-1,k}$

where $P_{i-1,k}$ is the final result from the previous year.

$P^{(j)}_{i,k} = \alpha\, TR_{i,j,k} + (1.0 - \alpha)\, P^{(j-1)}_{i,k}$  for all $j = 2, 3, 4$

$P_{i,k} = P^{(4)}_{i,k}$



Note that only one loss rate prediction is made for each 
year. This means that seasonal variations in the loss rate 
are not taken into account. Another approach would be to 
predict a loss rate separately for each quarter, the trade- 
off being that this procedure would require more years of 
data. This raises the question of whether using older data 
which takes into account seasonal variations would result 
in a better prediction than not using this older data but 
ignoring the seasonal variation. This is an area left for 
further study. 

The prediction is now made for each quarter.

Let $T_{i,j,k}$ be the prediction for year $i$ (first future
year), quarter $j$ and LOS $k$

$T_{i,j,k} = A_{i-1,j,k-1}\,(1.0 - P_{k-1})$  for all $j = 1, 2, 3, 4$

where $P_{k-1}$ is the most recent smoothed annual loss rate for LOS cell $k-1$.
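The loss-rate smoothing and the projection of cells 2-31 can be
sketched in Python. This is an illustration only, not the SMOTHY
subroutine listed at the end of the paper: the function names, the
zero-based list indexing, and the sample numbers are hypothetical.

```python
ALPHA = 0.4  # smoothing constant used by SMOTHY


def loss_rates(this_year, next_year):
    """Loss rate out of each cell for one quarter: the share of cell k
    that does not appear in cell k+1 of the same quarter a year later."""
    return [
        (this_year[k] - next_year[k + 1]) / this_year[k]
        for k in range(len(this_year) - 1)
    ]


def smooth_year(previous, quarterly_rates, alpha=ALPHA):
    """Run single exponential smoothing through the four quarterly
    rates of one year, starting from the previous year's final value."""
    p = previous
    for tr in quarterly_rates:
        p = alpha * tr + (1.0 - alpha) * p
    return p


def project(last_year, smoothed):
    """Cells 2 onward of the next year: each cell's population one year
    earlier, reduced by that cell's smoothed loss rate."""
    return [last_year[k] * (1.0 - smoothed[k]) for k in range(len(smoothed))]


rates = loss_rates([100.0, 80.0, 60.0], [90.0, 90.0, 40.0])
print(rates)                              # [0.1, 0.5]
print(smooth_year(0.1, [0.2] * 4))        # close to 0.18704
print(project([100.0, 80.0], rates))      # approximately [90.0, 40.0]
```

With alpha of 0.4, four quarters at a new rate of 0.2 move the smoothed
value from 0.1 most of the way to 0.2, which shows how heavily the most
recent year is weighted.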



As with the NAPPE model, this gives predictions for LOS 
cells 2-31. The calculation for cell 1 was done in a manner 
similar to the existing NAPPE model. The average proportion 
of the total force in cell 1 was calculated for the four 
years of data. For each quarter the following calculations 
were made: 

Let C1AV be the number which would be required in
cell 1, calculated by taking the above proportion
of the total force requirement for the quarter
being predicted.

Let $C1P = Req - \sum_{k=2}^{31} T_{i,j,k}$  (total required minus the sum of cells 2-31)

Half of the difference between these two values was
then allocated among cells 2-31 on the basis of the
number projected for that cell.

For each cell calculate

$ADJ_{i,j,k} = \dfrac{(C1AV - C1P)}{2} \cdot \dfrac{T_{i,j,k}}{\sum_{k=2}^{31} T_{i,j,k}}$

$T'_{i,j,k} = T_{i,j,k} + ADJ_{i,j,k}$

The value for cell 1 is then the difference between the
total required and the sum of the new predictions for cells
2-31.

2. Results of SMOTHY 

The model, as described above, was then run to 
obtain one year predictions for the last ten years. The 
following table is a comparison of these results with 
outputs from NAPPE for the same time periods. 



TABLE II

COMPARISON OF NAPPE AND SMOTHY PREDICTIONS
(costs are in millions of dollars)

          Actual     NAPPE         %        SMOTHY        %
Year      Cost       Prediction    Error    Prediction    Error

1965      1,362.1    1,360.0       .151     1,358.1       .209
1966      1,582.4    1,591.6       .501     1,582.0       .028
1967      1,720.7    1,736.5       .918     1,737.9       .997
1968      1,832.6    1,838.8       .337     1,836.9       .237
1969      2,009.2    2,016.7       .372     2,015.7       .322
1970      2,280.8    2,286.8       .262     2,280.9       .001
1971      2,264.8    2,258.2       .290     2,256.2       .379
1972      2,496.7    2,480.7       .643     2,483.6       .524
1973      2,683.6    2,678.1       .205     2,678.3       .196
1974      2,777.4    2,773.6       .138     2,772.1       .190

Mean                               .3817                  .3083
Mean 2                             .3221                  .2318



The value given in the table as mean is the mean of the
absolute percentage errors, and the value called mean 2 is the
same except that the outlier (1967) is left out of the
calculation. Leaving this value out is not unreasonable when
the events of 1967 are taken into consideration. It was during
this year that the structure of the force saw tremendous change
due to the Viet Nam buildup. Any model based on past data
cannot predict the future when major policy decisions make
that data inappropriate. This is a point where the analyst
using the model must use reason when looking at its output,
a point to be discussed later.

A close inspection of the preceding table yields
some surprising conclusions. Although the SMOTHY model does
not predict uniformly better, it does better in six of the
ten years and has the lower mean error with or without 1967.
This suggests that the complexities of the NAPPE model are
not only unnecessary, but have a negative effect.
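The summary rows of Table II can be recomputed from the tabulated
percentage errors with a few lines of Python. The variable and function
names are illustrative; the numbers are copied from the table.

```python
years = list(range(1965, 1975))
nappe = [0.151, 0.501, 0.918, 0.337, 0.372, 0.262, 0.290, 0.643, 0.205, 0.138]
smothy = [0.209, 0.028, 0.997, 0.237, 0.322, 0.001, 0.379, 0.524, 0.196, 0.190]


def mean_error(errors, skip=()):
    """Mean of the absolute percentage errors, leaving out years in `skip`."""
    kept = [e for e, y in zip(errors, years) if y not in skip]
    return sum(kept) / len(kept)


print(round(mean_error(nappe), 4), round(mean_error(smothy), 4))
# 0.3817 0.3083  (the "Mean" row)
print(round(mean_error(nappe, skip=(1967,)), 4),
      round(mean_error(smothy, skip=(1967,)), 4))
# 0.3221 0.2318  (the "Mean 2" row)

# Years in which SMOTHY has the smaller error.
print([y for y, n, s in zip(years, nappe, smothy) if s < n])
# [1966, 1968, 1969, 1970, 1972, 1973]
```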

C. DOUBLE EXPONENTIAL SMOOTHING 

The most important hypothesis tested was that single
exponential smoothing is not the appropriate tool for modeling
a time series which appears to have trends. As suggested by
Brown [1], Goodman [2], and others, higher-order exponential
smoothing is a valuable tool for modeling time series with
underlying trends. Since the time series under consideration
does not show any properties which would indicate anything
beyond a linear trend, only double smoothing was considered
in SMOTH2.

1. The SMOTH2 Subroutine

As with the SMOTHY subroutine, the SMOTH2 subroutine
makes use of only four years of historical data and only
the TOTALNAV inventories. It also smooths only the loss
rates. These loss rates ($TR_{i,j,k}$) are calculated exactly
the same as in SMOTHY, and the loss rate to be used for
prediction ($D_{i,k}$) is calculated as described by Brown [1].
The single smoothed portion ($P_{i,k}$) is calculated exactly as
before. The double smoothing term ($S_{i,k}$) is calculated by
simply smoothing the single smoothed value. The superscript
notation is again used for the four iterations exactly as used
to calculate $P_{i,k}$.

These two values are then combined to pick up the
linear trend and result in $D_{i,k}$.

Note that again only one value of the smoothed loss
rate is calculated for each year and the same procedural
question was left unanswered.

The prediction is then made for each quarter into
the future exactly as in the SMOTHY subroutine

$T_{i,j,k} = A_{i-1,j,k-1}\,(1.0 - D_{k-1})$

where $D_{k-1}$ is the most recent smoothed loss rate for LOS cell $k-1$.

As with the previously discussed models, this gives a
prediction for LOS cells 2-31. The final adjustments used
in SMOTH2 are exactly the same as used in SMOTHY.
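For readers unfamiliar with double smoothing, the following Python
sketch shows the standard textbook form of Brown's linear (double)
exponential smoothing. It is offered as an assumption about the general
technique, not as a transcription of SMOTH2: the function name, the
initialization with the first observation, and the one-step forecast
horizon are all illustrative choices.

```python
def brown_double_smoothing(series, alpha, horizon=1):
    """Forecast `horizon` steps ahead with Brown's double smoothing."""
    single = double = series[0]  # assumed starting values
    for x in series[1:]:
        single = alpha * x + (1.0 - alpha) * single       # smooth the data
        double = alpha * single + (1.0 - alpha) * double  # smooth it again
    level = 2.0 * single - double
    trend = alpha / (1.0 - alpha) * (single - double)
    return level + horizon * trend


# A constant series is forecast unchanged.
print(brown_double_smoothing([0.2] * 5, alpha=0.3))

# A long, steadily rising series is followed, which single
# smoothing alone cannot do: the next value would be 2.0.
rising = [0.01 * t for t in range(200)]
print(brown_double_smoothing(rising, alpha=0.3))
```

Because the forecast extrapolates the most recent trend, it overshoots
when the trend reverses.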

2. Results of SMOTH2

As in the experiment with SMOTHY, it was obvious
that the most recent data should be most heavily weighted.
Therefore, an initial value of 0.4 was used for alpha.
Since the impact of trend was the most important
consideration of the research, other values of alpha were also
tried. The following table is the result of these tests. In
order to put the results in a form for analysis, the actual
dollar values were not tabulated but only the percentage
errors. The elements of the table are the percentage error from
the actual total cost of the force for NAPPE, for NAPPE with
the SMOTHY subroutine (these values are the same as in
Table II), and for NAPPE with the SMOTH2 subroutine using
values of alpha of 0.2, 0.3, and 0.4.



TABLE III

COMPARISON OF ABSOLUTE PERCENTAGE ERRORS
OF NAPPE, SMOTHY, AND SMOTH2

                                SMOTH2
Year      NAPPE    SMOTHY    alpha=.2   alpha=.3   alpha=.4

1965      .151     .209      .242       .223       .215
1966      .501     .028      .015       .027       .018
1967      .918     .997      1.098      1.158      1.170
1968      .337     .237      .196       .098       .273
1969      .372     .322      .133       .033       .054
1970      .262     .001      .205       .076       .041
1971      .290     .379      .443       .423       .473
1972      .643     .524      .510       .527       .543
1973      .205     .196      .171       .164       .147
1974      .138     .190      .132       .068       .059

Mean      .3817    .3083     .3145      .2797      .2993
Mean 2    .3221    .2318     .2274      .1821      .2026
Mean 3    .2801    .1690     .1563      .0984      .1153



The values of mean and mean 2 have the same definition
as in the preceding table. The value mean 3 was calculated
leaving out the values for 1967, 1971, and 1972. The reason
for making this calculation will be discussed in detail later.

The first overview of the table would result in a
conclusion that SMOTH2, with an alpha value of 0.3, seems to
be a somewhat better model on the basis of the mean alone.
However, the mean is not the only significant feature. The
most important observation is that SMOTH2 predicts extremely
well for all years except 1967, 1971, and 1972. The reason
for the poor prediction in 1967 has already been discussed.
The poor predictions for 1971 and 1972 can be explained in
the same manner, except that the force structure was moving
in the opposite direction. That is, these were the years of
major policy changes resulting from the end of the Viet Nam
War and the shrinking of the force.

Because of the linear trend, which is a part of
SMOTH2, it must be expected to do poorly when the direction
of the trend changes. This means that SMOTH2 should have
more difficulty "turning the corner" when there are major
policy changes. This does not mean that it is a poor model,
but rather that some decision rule is required of the user
when this occurs.

D. THE MATRIX GENERATION PROCEDURE 

Although incomplete and inconclusive, a study of the
Mosteller method, as used in this model, had some interesting
results. It was found that, in general, the Mosteller
procedure is extremely sensitive to the base matrix. Changing
the value in one cell of the base matrix resulted in changing
the values in virtually every cell of the output matrix. There
was no consistency found in these changes. The surprising
result was that the changes in the output matrix were sometimes
greater than the initial changes in the base matrix. For
example, changing one cell value in the base matrix by less
than 5% could result in changes in the output matrix of
greater than 5% in some cells.

This is extremely significant when considering its use
in this model. The base matrix is calculated as a simple
average of the last twelve quarterly inventories. This means
that seasonal variations in the force structure are not
taken into account and implies that a better base matrix
may be possible.

A hypothesis was made that a better base matrix could
be calculated by computing some very rough transition rates
from one cell to another and applying these rates to the
inventory of one year earlier. Experiments with this hypothesis
showed some very promising results but were inconclusive.
Continued study in this area may be of considerable value.



IV. CONCLUSIONS



The NAPPE model, in its present form of single exponential
smoothing, does not appear to be the appropriate model.
Single exponential smoothing assumes that the time series is
basically constant in time and that differences from the mean
are caused by random noise. This does not appear to be the
case with LOS populations or with transition rates.

A recommended change to the model is to remove the
complex SMOOTH and ADJSMO subroutines and replace them with
the SMOTH2 subroutine, using an alpha of 0.3. It should be
made clear to any intended user, however, that substantial
changes in enlisted force management would not be
reflected in the prediction. The modeling approach should,
in fact, be completely revised so that changes of this
magnitude can be accounted for. Since pay grade totals are used
to drive the force structure, the model is aware of impending
changes in direction. This information is not currently
being used in loss prediction by NAPPE.

In addition, the base matrix used in the Mosteller
procedure could be estimated more carefully. Based on these
preliminary experiments, this could result in a much better
estimate of force structure, and hence a more accurate budget
prediction. The determination of the LOS cell 1 population
remains somewhat ad hoc, as does the choice of a smoothing
constant of 0.3. While this study has demonstrated that
a simpler approach to the prediction can be successfully
taken, not all alternatives have been thoroughly
investigated.



APPENDIX A



This appendix is a rough documentation of the SMOOTH 
subroutine in the current NAPPE model. 

Let 



A. . , 



be the actual population for year i 
quarter j and LOS cell k 



For each LOS k = 1, 2 , 30 , calculate the transition rate 

for each year and quarter 

j ,k ~ '^i+1, j,k+l 

A. . , 

1/ D /k 

For each year i = 1, 2 , NYR-1 (NYR = last year of 

historical data) and each a = .05, .10, ..., .95, calculate: 




Let n=4i+j j=l,2, 3,4 

For each year then the predicted transition rate is 




PRED 



i+2 , a 



P 



1 

4i + l 
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The relative error in this prediction is then 
the sum of the differences between the predicted 
and actual transition rates for the year 



1 + 2 , a 



4 

= Z 

j=l 



TR. - P . . 

1+1 , 3 4i+l 



1 - TR.^, . 

1 + 1,3 



Go to the INLINE subroutine to choose the best a. 
INLINE Subroutine 

For each year i = 1, 2, NYR find the best a for 

predicting the following year. 

For each a = .05, .10, ..., .95 calculate 



EMIN, = ER. 

1 1 , a 



EMINo = Z ER^ 

2 l,a 



EMIN- = Z (ER^ . ) 

3 



Select the a v/hich gives the minimum value of EMIN^, 
EMIN 2 , EMIN^ and call them a*, a*, a*. 

Calculate 



SE, = Z (ER, *) 
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^ 2 

SE = E (ER *) 

t=l ' 2 Within the summation 

here, each a* is the 

one which was selected 

i 2 given year 

<.-j. 2 



Select the minimum of these three values and the 
* 

most recent a for that method is the a to be used 
to predict the following year. Call this value 



For each year i = 1, 2, ..., NYR and each quarter of the
year, make the prediction based on the transition rate:

$$T_{i,j,k} = A_{i-1,j,k} \, (1.0 - PRED_{i,\alpha_i})$$



Now make a similar prediction based on population in each
cell.

For each year i = 2, 3, ..., NYR and each $\alpha$ = .05, .10,
..., .95

$$P_8 = A_{1,4,k+1}$$

Let n = 4i + j, j = 1, 2, 3, 4

$$P_{n+1} = \alpha \, A_{i,j,k+1} + (1.0 - \alpha) \, P_n$$

$$PRED_{i+1,\alpha} = P_{4i+1}$$

The relative error in this prediction is then

$$ER_{i+1,\alpha} = \sum_{j=1}^{4} \frac{A_{i+1,j,k+1} - P_{4i+1}}{A_{i+1,j,k+1}}$$

Go to the INLINE subroutine to choose the best $\alpha$. For
each year i = 2, 3, ..., NYR and each quarter (the same
value is predicted for all four quarters of a given year),
make the prediction based on cell populations:

$$P_{i,k} = PRED_{i,\alpha_i}$$

APPENDIX B



This appendix is a rough documentation of the ADJSMO 
subroutine in the current NAPPE model. 

Define the five methods or techniques used in the
subroutine:

1. $P(1)_{i,j} = T_{i,j,k}$
   This is the predicted value for year i, quarter j, calculated
   as the transition rate based prediction from SMOOTH using
   the TOTALNAV data.

2. $P(2)_{i,j} = T_{i,j,k}$
   This is the sum of the predictions from SMOOTH made as above
   using USN and USNR data.

3. $P(3)_{i,j} = P_{i,k}$
   This is the predicted value for year i, quarter j, calculated
   as the population based prediction from SMOOTH using the
   TOTALNAV data.

4. $P(4)_{i,j} = P_{i,k}$
   This is the sum of the predictions from SMOOTH made as above
   using USN and USNR data.

5. $P(5)_{i,j} = BWT \, \frac{P(1)_{i,j} + P(2)_{i,j}}{2} + (1.0 - BWT) \, \frac{P(3)_{i,j} + P(4)_{i,j}}{2}$
   This is a weighted average of methods 1-4, where the value of
   BWT is the value which would have predicted best for the
   previous year.

For each LOS k = 2, 3, ..., 31:

For each year i = 1, 2, ..., NYR calculate the cumulative
square error for each method I = 1, 2, 3, 4, 5.

$$TEP(I)_i = \sum_{n=1}^{i} \sum_{j=1}^{4} \left( \frac{A_{n,j,k} - P(I)_{n,j}}{A_{n,j,k}} \right)^2$$

For each year calculate the cumulative square error
for all values of BWT = .45, .50, ..., .95

$$ET2(BWT)_i = \sum_{n=1}^{i} \sum_{j=1}^{4} \left( \frac{A_{n,j,k} - X_{n,j}(BWT)}{A_{n,j,k}} \right)^2$$

where

$$X_{i,j}(BWT) = BWT \, \frac{P(1)_{i,j} + P(2)_{i,j}}{2} + (1.0 - BWT) \, \frac{P(3)_{i,j} + P(4)_{i,j}}{2}$$

Based on TEP select the best method, I, and based on
ET2 select the best BWT, which will then be used to calculate
P(5) for the following year.
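The two selections described here (best method from TEP, best BWT from ET2) can be sketched in Python. This is an illustration under stated assumptions rather than the ADJSMO code: the error is taken to be the squared relative error summed over years and quarters, and the blend halves each pair of predictions so that the weights sum to one. The halving, the data layout (`actual[n][j]`, one list of four quarters per year) and all function names are assumptions of this sketch.

```python
def cumulative_sq_error(actual, pred):
    """Sum over years and quarters of ((A - P) / A) ** 2."""
    return sum(((a - p) / a) ** 2
               for a_year, p_year in zip(actual, pred)
               for a, p in zip(a_year, p_year))


def blend(p1, p2, p3, p4, bwt):
    """Weighted average of the four predictions; the halving is an assumption."""
    return [[bwt * (x1 + x2) / 2 + (1.0 - bwt) * (x3 + x4) / 2
             for x1, x2, x3, x4 in zip(*rows)]
            for rows in zip(p1, p2, p3, p4)]


# Candidate weights .45, .50, ..., .95
BWTS = [round(0.45 + 0.05 * m, 2) for m in range(11)]


def best_method(actual, preds):
    """preds maps a method number to its predictions; return the best number."""
    return min(preds, key=lambda m: cumulative_sq_error(actual, preds[m]))


def best_bwt(actual, p1, p2, p3, p4):
    """Return the candidate BWT whose blend has the smallest cumulative error."""
    return min(BWTS, key=lambda b: cumulative_sq_error(actual, blend(p1, p2, p3, p4, b)))
```

Passing only the first i years of `actual` and of each prediction reproduces the "cumulative through year i" character of the two error measures.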

Calculate the cumulative absolute error for the entire
period using the method which was selected as best for each
year.

$$TER_k = \sum_{i=1}^{NYR} \sum_{j=1}^{4} \frac{\left| A_{i,j,k} - P(I_{i-1})_{i,j} \right|}{A_{i,j,k}}$$

where $I_{i-1}$ was the best method for that year.



Make the initial prediction for the first future year:

$$A_{NYR+1,j,k+1} = P(I_{NYR})_{NYR+1,j}$$

where $I_{NYR}$ is the method selected as best on the last
year of data.



Let $IPG_{i,j}$ be the total force required.

Calculate the average proportion of the total
force in cell 1 for all years of data; call it PAV.
For each quarter to be predicted, calculate

$$C1AV = (PAV)(IPG)$$

$$C1P = IPG - \sum_{k=2}^{31} A_{i,j,k}$$

These two values are the possible predictions for
LOS 1. C1AV is based on the average proportion of the force
in cell 1, while C1P is simply the difference between the
total required force and the predictions for cells 2-31.

Let

$$C1ADJ = \frac{C1AV + C1P}{2}$$

There is a test in the model to ensure that this average
is between the values which would have been calculated
using the largest and smallest proportions of the total
population in cell 1 over the entire data base.

Take the difference between C1ADJ and C1P and allocate it
among cells 2-31 according to the total error which was
calculated for predicting that cell using the best method:

$$A^*_{i,j,k} = A_{i,j,k} + (C1P - C1ADJ) \, \frac{TER_k \, A_{i,j,k}}{\sum_{m=2}^{31} TER_m \, A_{i,j,m}}$$

Adjust cell 1 from these values:

$$A_{i,j,1} = IPG_{i,j} - \sum_{k=2}^{31} A^*_{i,j,k}$$
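The cell 1 adjustment for a single quarter can be sketched in Python. This is a hedged illustration, not the model's implementation. It assumes the two candidate values for cell 1 are the average share of the required force and the residual after cells 2-31, that their mean is clamped to the historical range of cell 1 shares (the text mentions a test but not what the model does when it fails), and that the difference is spread over cells 2-31 in proportion to each cell's total error times its predicted population. The function name `adjust_cells` and its parameters are hypothetical.

```python
def adjust_cells(ipg, preds, ter, pav, pmin, pmax):
    """Return (cell 1 value, adjusted predictions for cells 2..31).

    ipg   : total required force for the quarter
    preds : predicted populations for cells 2..31
    ter   : cumulative absolute error for each of those cells (same order)
    pav, pmin, pmax : average, smallest and largest historical share of cell 1
    """
    c1av = pav * ipg                  # cell 1 from its average share of the force
    c1p = ipg - sum(preds)            # cell 1 as the residual of cells 2..31
    c1adj = (c1av + c1p) / 2.0
    # Keep the average inside the historical range (clamping is an assumption).
    c1adj = min(max(c1adj, pmin * ipg), pmax * ipg)
    weights = [t * a for t, a in zip(ter, preds)]
    total = sum(weights)              # must be non-zero for the allocation to work
    adjusted = [a + (c1p - c1adj) * w / total for a, w in zip(preds, weights)]
    cell1 = ipg - sum(adjusted)
    return cell1, adjusted
```

With this allocation the adjusted cells and cell 1 always sum to the required force, and cells that were predicted poorly in the past absorb more of the correction.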



The same basic procedure is used for predicting additional
future years (up to 10) and the value of continuing the
discussion is questionable.